Model Evidence

A marginal likelihood is a likelihood function that has been integrated over the parameter space. In Bayesian statistics, it represents the probability of generating the observed sample from a prior and is therefore often referred to as the model evidence or simply the evidence.


Concept

Given a set of independent, identically distributed data points \mathbf{X}=(x_1,\ldots,x_n), where x_i \sim p(x\mid\theta) according to some probability distribution parameterized by \theta, and where \theta is itself a random variable described by a distribution \theta \sim p(\theta\mid\alpha), the marginal likelihood asks what the probability p(\mathbf{X}\mid\alpha) is, where \theta has been marginalized out (integrated out):

:p(\mathbf{X}\mid\alpha) = \int_\theta p(\mathbf{X}\mid\theta) \, p(\theta\mid\alpha)\ \operatorname{d}\!\theta

The above definition is phrased in the context of Bayesian statistics, in which case p(\theta\mid\alpha) is called the prior density and p(\mathbf{X}\mid\theta) is the likelihood. The marginal likelihood quantifies the agreement between data and prior in a geometric sense made precise in de Carvalho et al. (2019).

In classical (frequentist) statistics, the concept of marginal likelihood occurs instead in the context of a joint parameter \theta = (\psi,\lambda), where \psi is the actual parameter of interest and \lambda is a nuisance parameter. If a probability distribution for \lambda exists, it is often desirable to consider the likelihood function only in terms of \psi, by marginalizing out \lambda:

:\mathcal{L}(\psi;\mathbf{X}) = p(\mathbf{X}\mid\psi) = \int_\lambda p(\mathbf{X}\mid\lambda,\psi) \, p(\lambda\mid\psi) \ \operatorname{d}\!\lambda

Unfortunately, marginal likelihoods are generally difficult to compute. Exact solutions are known for a small class of distributions, particularly when the marginalized-out parameter is the conjugate prior of the distribution of the data. In other cases, some kind of numerical integration is needed: either a general method such as Gaussian quadrature or a Monte Carlo method, or a method specialized to statistical problems such as the Laplace approximation, Gibbs/Metropolis sampling, or the EM algorithm.

It is also possible to apply the above considerations to a single random variable (data point) x, rather than a set of observations. In a Bayesian context, this is equivalent to the prior predictive distribution of a data point.
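As a concrete illustration of the two approaches just mentioned, the following sketch computes the marginal likelihood of Bernoulli observations under a Beta prior in two ways: exactly, using the fact that the Beta prior is conjugate to the Bernoulli likelihood, and approximately, by simple Monte Carlo averaging of the likelihood over draws from the prior. The data and hyperparameter values are hypothetical, chosen only for the example.

```python
import math
import random

def beta_fn(a, b):
    # Beta function B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b)
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def marginal_likelihood_exact(data, a, b):
    # Closed-form marginal likelihood for Bernoulli data with a Beta(a, b)
    # prior: integrating theta out gives B(a + k, b + n - k) / B(a, b).
    k, n = sum(data), len(data)
    return beta_fn(a + k, b + n - k) / beta_fn(a, b)

def marginal_likelihood_mc(data, a, b, n_samples=200_000, seed=0):
    # Simple Monte Carlo estimate: draw theta from the prior and average
    # the likelihood p(X | theta) over the draws.
    rng = random.Random(seed)
    k, n = sum(data), len(data)
    total = 0.0
    for _ in range(n_samples):
        theta = rng.betavariate(a, b)   # sample from the Beta(a, b) prior
        total += theta**k * (1 - theta)**(n - k)
    return total / n_samples

data = [1, 0, 1, 1, 0, 1]               # four successes in six trials
exact = marginal_likelihood_exact(data, a=2.0, b=2.0)
approx = marginal_likelihood_mc(data, a=2.0, b=2.0)
print(exact, approx)                    # the two values should agree closely
```

The Monte Carlo estimator converges to the exact value as the number of prior draws grows, but its variance can be large when the prior and the likelihood disagree, which is one reason the specialized methods listed above exist.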


Applications


Bayesian model comparison

In Bayesian model comparison, the marginalized variables \theta are parameters for a particular type of model, and the remaining variable M is the identity of the model itself. In this case, the marginalized likelihood is the probability of the data given the model type, not assuming any particular model parameters. Writing \theta for the model parameters, the marginal likelihood for the model ''M'' is

: p(\mathbf{X}\mid M) = \int p(\mathbf{X}\mid\theta, M) \, p(\theta\mid M) \, \operatorname{d}\!\theta

It is in this context that the term ''model evidence'' is normally used. This quantity is important because the posterior odds ratio for a model ''M''1 against another model ''M''2 involves a ratio of marginal likelihoods, the so-called Bayes factor:

: \frac{p(M_1\mid\mathbf{X})}{p(M_2\mid\mathbf{X})} = \frac{p(M_1)}{p(M_2)} \, \frac{p(\mathbf{X}\mid M_1)}{p(\mathbf{X}\mid M_2)}

which can be stated schematically as

:posterior odds = prior odds × Bayes factor
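The following sketch illustrates the Bayes factor for a hypothetical comparison of two Beta-Bernoulli models that differ only in their prior over the success probability: one prior concentrated near one half, one favouring large values. Because the Beta prior is conjugate, each model evidence p(\mathbf{X}\mid M) has a closed form, computed here in log space for numerical stability.

```python
import math

def log_beta(a, b):
    # log of the Beta function, via log-gamma for numerical stability
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_marginal(data, a, b):
    # log model evidence log p(X | M) for Bernoulli data under a Beta(a, b)
    # prior: log B(a + k, b + n - k) - log B(a, b)
    k, n = sum(data), len(data)
    return log_beta(a + k, b + n - k) - log_beta(a, b)

data = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1]   # 8 successes in 10 trials

# M1: Beta(10, 10) prior, concentrated near theta = 0.5
# M2: Beta(8, 2) prior, favouring large theta
log_bf = log_marginal(data, a=10.0, b=10.0) - log_marginal(data, a=8.0, b=2.0)
bayes_factor = math.exp(log_bf)         # support for M1 over M2
print(bayes_factor)                     # < 1 here: the data favour M2
```

Multiplying this Bayes factor by the prior odds p(M_1)/p(M_2) gives the posterior odds, exactly as in the schematic identity above.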


See also

* Empirical Bayes methods
* Lindley's paradox
* Marginal probability
* Bayesian information criterion


References

* Charles S. Bos. "A comparison of marginal likelihood computation methods". In W. Härdle and B. Ronz, editors, ''COMPSTAT 2002: Proceedings in Computational Statistics'', pp. 111–117. 2002.
* de Carvalho, Miguel; Page, Garritt; Barney, Bradley (2019). "On the geometry of Bayesian inference". ''Bayesian Analysis''. 14 (4): 1013–1036.
* {{cite book |first=Ben |last=Lambert |chapter=The devil is in the denominator |pages=109–120 |title=A Student's Guide to Bayesian Statistics |publisher=Sage |year=2018 |isbn=978-1-4739-1636-4}}
* David J.C. MacKay. ''Information Theory, Inference, and Learning Algorithms'' (on-line textbook).